-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[improve][broker] Optimize high CPU usage when consuming from topics with ongoing txn #23189
Conversation
Same problem as #22944? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM Good catch @coderzc!
This is a quick fix, it is only effective for |
…with ongoing txn (apache#23189) (cherry picked from commit 94e1341) (cherry picked from commit b7ffa73)
…with ongoing txn (apache#23189) (cherry picked from commit 94e1341) (cherry picked from commit b7ffa73)
@coderzc This is not a complete fix. Such as this scene, maxReadPosition < readPosition < LAC, and no new message coming. With this fix, although it would not enter the loop, but then the cursor would be added to waitingCursor. Because no new message coming, the waitingCursor can not be notified, and then the consume process is stuck. Although we then commit the txn to move the maxReadPosition forward, the consume still stuck and remain some message can not be consumed. Can you help review #22944 ? |
Motivation
We found the CPU of the broker busy with calling
ManagedLedgerImpl.internalReadFromLedger
and broker checked thereadPosition > maxPosition
and then triggered thereadMoreEntries
call again leading to the broker looping to call readMoreEntries. But inManagedCursorImpl.asyncReadEntriesWithSkipOrWait
we have checked no more data to read via hasMoreEntries(). I think this case may be caused bymaxReadPosition < lastConfirmedPosition
when the topic exists ongoing Txn. So I thinkmaxPosition <= readPosition
we should not read entries immediately, instead, we delay calling read entries.Modifications
If
maxPosition < readPosition
then delayed trigger readEntries.Test Code:
CPU usage before applying this change:
![image](https://private-user-images.githubusercontent.com/26179648/358842391-fe3589ac-624b-4e49-b114-8e58fd3de526.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3Mzk2MzI2NDUsIm5iZiI6MTczOTYzMjM0NSwicGF0aCI6Ii8yNjE3OTY0OC8zNTg4NDIzOTEtZmUzNTg5YWMtNjI0Yi00ZTQ5LWIxMTQtOGU1OGZkM2RlNTI2LnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNTAyMTUlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjUwMjE1VDE1MTIyNVomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPWU2ZDdlZGRkNjcyZjAzMjExNTY3MWRjZjIyZjJhMWVmMjRmMWQ1YWFkY2RlMjQ1MDIzNTJmZTY5ZGIxMWQ4MWUmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0In0.x5qTJ2eNjrIxGkiOfgVVClUkB1v-pTN5upOeSXON7hc)
flamegraph: https://drive.google.com/file/d/1nNb4MOdbZB7mO4fWitts2UpjzutdT22O/view?usp=sharing
CPU usage after applying this change:
![image](https://private-user-images.githubusercontent.com/26179648/359419895-75dda040-952c-413b-8cbf-d5714207359b.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3Mzk2MzI2NDUsIm5iZiI6MTczOTYzMjM0NSwicGF0aCI6Ii8yNjE3OTY0OC8zNTk0MTk4OTUtNzVkZGEwNDAtOTUyYy00MTNiLThjYmYtZDU3MTQyMDczNTliLnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNTAyMTUlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjUwMjE1VDE1MTIyNVomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPWFlM2MyNWJmMGZiZWZmYTkzOGU0N2EwYTNmZmU0Y2ZjZWE0MTE3YTQwYTBjMmY4MjI0OTk0M2JkZDU1MWZmYTMmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0In0.YIqz-jAvvfoAGeqBBHvqf6erAosBo_7aPb6PmPKCIp8)
flamegraph: https://drive.google.com/file/d/1AndMJuSMXhOImf3T0hg_E7YeCyslTaNI/view?usp=sharing
Verifying this change
(Please pick either of the following options)
This change is a trivial rework / code cleanup without any test coverage.
(or)
This change is already covered by existing tests, such as (please describe tests).
(or)
This change added tests and can be verified as follows:
(example:)
Does this pull request potentially affect one of the following parts:
If the box was checked, please highlight the changes
Documentation
doc
doc-required
doc-not-needed
doc-complete
Matching PR in forked repository
PR in forked repository: